COVID-19 Analysis
This short report analyzes some of the data on the COVID-19 pandemic from the repository maintained by Johns Hopkins CSSE.
Preliminaries
We have developed a simple Python module with a set of functions for visualizing the available data. The module (hedera_covid.py) is available in the repository, together with the data. We periodically update the datasets with the latest versions available online.
# plotting with plotly
from plotly.offline import iplot
from plotly.offline import init_notebook_mode, plot
# IPython.display replaces the deprecated IPython.core.display import path
from IPython.display import display, HTML
import plotly as py
import plotly.tools as tls
import numpy as np
# helpers from our own module
from hedera_covid import DataHandler, plot_death_rate, plot_daily_cases, plot_confirmed_cases
# load data
path_confirmed = '../../Data/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
path_death = '../../Data/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
covid_data = DataHandler(data_confirmed_path=path_confirmed,
                         data_death_path=path_death)
covid_data.confirmed.head()
First, we load the data and create an object of type DataHandler. This class performs operations on the dataset internally, so that afterwards the data are clean and ready for plotting.
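The JHU CSSE time-series files have one row per Province/State and one column per date, so a country that is split into several provinces (China, for example) has to be summed before it can be plotted as a single curve. The DataHandler source is not reproduced here; what follows is only a minimal standard-library sketch of that kind of clean-up, assuming the column layout of the CSV files loaded above. The helper name load_country_series and the sample numbers are hypothetical.

```python
import csv
import io

# Tiny excerpt in the wide layout of the JHU CSSE time-series CSVs:
# Province/State, Country/Region, Lat, Long, then one column per date.
# Coordinates and counts are made up for illustration only.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Italy,0.0,0.0,0,2,5
Region A,France,0.0,0.0,1,1,3
Region B,France,0.0,0.0,0,2,4
"""


def load_country_series(csv_text):
    """Hypothetical helper: sum all provinces of each country, day by day.

    Returns (dates, series), where series maps a Country/Region name to
    its list of cumulative counts, one entry per date column.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    dates = header[4:]  # skip Province/State, Country/Region, Lat, Long
    series = {}
    for row in reader:
        country = row[1]
        counts = [int(value) for value in row[4:]]
        totals = series.setdefault(country, [0] * len(dates))
        for i, value in enumerate(counts):
            totals[i] += value  # add this province to the country total
    return dates, series


dates, series = load_country_series(SAMPLE_CSV)
print(series["France"])  # [1, 3, 7]
```

Selecting a country by its Country/Region name then reduces to a dictionary lookup such as series["Italy"].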
Initialize list of countries
Now we can retrieve the data of a single country that we want to look at. This can be done using
my_country = covid_data.get_country('Name of the country')
Here 'Name of the country' is the name as it appears in the Country/Region column of the covid_data.confirmed dataframe.
For example:
italy = covid_data.get_country('Italy')
italy.keys()
We can also build a list of countries to analyze together, adding each one with
covid_data.add_country('Name of the country')
For example:
my_countries = ['Italy', 'Spain', 'Germany', 'Austria', 'France', 'United Kingdom', 'US', 'Sweden', 'Netherlands']
# register every country of interest in the DataHandler
for c in my_countries:
    covid_data.add_country(c)
Starting from this object, we can now use some functions implemented in our module to visualize the data.
Note: this is ongoing work!
data = covid_data.get_confirmed_data(start_date=0,n_smooth=7,rescale=True)
Parameters:
* start_date: day from which to start plotting (0 = January 22)
* n_smooth: smoothing of the data (7 is usually good)
* rescale: set True to rescale the curves to the same start (the day on which the number of infected reached 100 in the corresponding country). In this case start_date won't be used.
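The start_date and rescale options can be pictured as trimming each country's cumulative series. Below is a hypothetical sketch (align_series is not part of hedera_covid), assuming day 0 is January 22 and that "reached 100" means at least 100 confirmed cases; the module may treat the threshold differently.

```python
def align_series(counts, start_date=0, rescale=False, threshold=100):
    """Hypothetical sketch of the start_date / rescale options.

    counts: cumulative confirmed cases, one per day (index 0 = January 22).
    rescale=True drops every day before the first one with at least
    `threshold` cases, and start_date is then ignored; otherwise the
    series simply starts at index start_date.
    """
    if rescale:
        for day, value in enumerate(counts):
            if value >= threshold:
                return counts[day:]
        return []  # the country never reached the threshold
    return counts[start_date:]


cases = [3, 20, 79, 150, 400, 1200]  # illustrative numbers only
print(align_series(cases, rescale=True))  # [150, 400, 1200]
print(align_series(cases, start_date=2))  # [79, 150, 400, 1200]
```

After rescaling, index 0 of every curve means "first day with 100 cases", which is what makes countries with different outbreak dates comparable on one plot.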
Then we can put these data into a plotly figure and display it.
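init_notebook_mode(connected=True)
fig = {
    "data": data,
    "layout": {"title": {"text": "Confirmed Cases (rescaled in each country)"}}
}
# write the figure to an HTML file without opening a browser tab, then show it inline
plot(fig, filename='figure.html', auto_open=False)
display(HTML(filename='figure.html'))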
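The figure passed to plot is a plain dictionary whose "data" entry is a list of traces. We do not show the internals of get_confirmed_data here; assuming it returns something equivalent, one way to build such a list by hand is one plotly scatter trace, written as a plain dict, per country. The helper make_traces and the numbers are hypothetical.

```python
def make_traces(series_by_country):
    """Hypothetical helper: one plotly line trace (as a plain dict) per country."""
    traces = []
    for country, counts in series_by_country.items():
        traces.append({
            "type": "scatter",
            "mode": "lines",
            "name": country,                 # legend entry
            "x": list(range(len(counts))),   # days since the start of the curve
            "y": counts,
        })
    return traces


series_by_country = {"Italy": [120, 180, 260], "Spain": [110, 150, 240]}  # illustrative
fig = {
    "data": make_traces(series_by_country),
    "layout": {"title": {"text": "Confirmed Cases (rescaled in each country)"}},
}
# fig can then be passed to plotly.offline.plot as above
```

Keeping traces as plain dicts avoids importing plotly.graph_objects; plotly accepts both forms.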
Daily variation
Assuming that the number of reported cases is representative of the total number in each country, the daily new cases can indicate whether a country is flattening the curve.
For this, we can use a function that gathers these data for the selected countries.
Parameters:
* start_date: day on which the plot starts (0 = January 22)
* n_smooth: smoothing of the data (data will be averaged over n_smooth days, 7 is usually good)
* rescale: set True to rescale the curves to the same start. In this case start_date won't be used.
We use rescale=True: each curve starts on the day the number of reported infections in that country reached 100.
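Averaging over n_smooth days is a moving average. The sketch below assumes a trailing window (each day averaged with the previous n_smooth - 1 days) and treats n_smooth of 0 or 1 as "no smoothing"; whether hedera_covid centres the window or handles the first days the same way is not stated, so the function name smooth and these choices are assumptions.

```python
def smooth(values, n_smooth):
    """Trailing moving average over n_smooth days (a sketch).

    The first days use a shorter window, since fewer than n_smooth
    values are available yet. n_smooth <= 1 returns the data unchanged.
    """
    if n_smooth <= 1:
        return list(values)
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - n_smooth + 1): i + 1]  # up to n_smooth most recent days
        smoothed.append(sum(window) / len(window))
    return smoothed


print(smooth([7, 0, 14, 7], 2))  # [7.0, 3.5, 7.0, 10.5]
```

A 7-day window is a common choice for this kind of data because it averages out weekday reporting patterns.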
data = covid_data.get_daily_confirmed_data(start_date=0,n_smooth=7,rescale=True)
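init_notebook_mode(connected=True)
fig = {
    "data": data,
    "layout": {"title": {"text": "Daily Cases (rescaled)"}}
}
# write the figure to an HTML file without opening a browser tab, then show it inline
plot(fig, filename='figure.html', auto_open=False)
display(HTML(filename='figure.html'))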
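Daily new cases follow from the cumulative confirmed counts as day-to-day differences. The sketch below assumes get_daily_confirmed_data works this way; daily_new_cases is a hypothetical helper, and it drops the first day because there is no previous value to subtract.

```python
def daily_new_cases(cumulative):
    """Day-to-day increments of a cumulative series (a sketch).

    The result has one element fewer than the input. Values can be
    negative when a country revises its cumulative count downward.
    """
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]


print(daily_new_cases([100, 130, 170, 190]))  # [30, 40, 20]
```

A flattening curve shows up here as increments that stop growing and then shrink, even while the cumulative total keeps rising.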
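Finally, we look at the official mortality rate, i.e. the ratio between reported deaths and confirmed cases, starting 30 days after January 22 and without smoothing or rescaling.
data = covid_data.get_death_rate_data(start_date=30, n_smooth=0, rescale=False)
init_notebook_mode(connected=True)
fig = {
    "data": data,
    "layout": {"title": {"text": "Official mortality rate: # Death/# Confirmed"}}
}
# write the figure to an HTML file without opening a browser tab, then show it inline
plot(fig, filename='figure.html', auto_open=False)
display(HTML(filename='figure.html'))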
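The official mortality rate in the plot title is, per day, the number of deaths divided by the number of confirmed cases. Here is a minimal sketch under that reading; death_rate is a hypothetical helper, and returning None on days without confirmed cases (instead of dividing by zero) is our own choice.

```python
def death_rate(deaths, confirmed):
    """Daily official mortality rate: reported deaths / confirmed cases.

    Days with no confirmed cases yield None instead of raising
    ZeroDivisionError.
    """
    return [d / c if c > 0 else None for d, c in zip(deaths, confirmed)]


rates = death_rate([0, 1, 5], [0, 50, 100])  # illustrative numbers
print(rates)  # [None, 0.02, 0.05]
```

Since both counts are reported figures, this ratio is only as reliable as the reporting in each country, which is why the plot calls it the official rate.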
#Parameters:
#* `countries`: a list of countries
#* `start_date`: day on which the plot starts (0 = January 22)
#* `n_smooth`: smoothing of the data (data will be averaged over `n_smooth` days, 7 is usually good)
#* `rescale`: set `True` if you want to rescale curves to the same *start*. In this case `start_date` won't be used.
#* `log_scale`: set to `True` to use a logarithmic scale for the *y*-axis
#
#plot_confirmed_cases(covid_data.countries,start_date=30,n_smooth=0,rescale=False,log_scale=True)
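On a logarithmic y-axis, exponential growth appears as a straight line, which makes the growth rates of different countries easier to compare. In a plotly figure dictionary like the ones used above, this corresponds to setting the layout's yaxis type to "log". The sketch below shows what a log_scale=True option could set; with_log_y is a hypothetical helper, not the implementation of plot_confirmed_cases.

```python
def with_log_y(fig):
    """Return a copy of a plotly figure dict with a logarithmic y-axis.

    The input figure is left unchanged; existing layout entries
    (e.g. the title) are kept.
    """
    layout = dict(fig.get("layout", {}))
    yaxis = dict(layout.get("yaxis", {}))
    yaxis["type"] = "log"  # plotly's layout.yaxis.type accepts "log"
    layout["yaxis"] = yaxis
    return {**fig, "layout": layout}


fig = {"data": [], "layout": {"title": {"text": "Confirmed Cases"}}}
log_fig = with_log_y(fig)
print(log_fig["layout"]["yaxis"])  # {'type': 'log'}
```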